TIKA-4679: Add HTTP/2 support to tika-server via Jetty http2-server #2672
nddipiazza wants to merge 5 commits into main
- Add tika-e2e-tests/tika-server module with TikaServerHttp2Test
- Test starts the real fat-jar and verifies HTTP/2 (h2c) responses via Java HttpClient configured with Version.HTTP_2
- Wire module into tika-e2e-tests/pom.xml modules list
- Module is skipped by default; enable with -Pe2e profile
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
I think we're good with Java 17.
…h-check
- Add Assumptions.assumeTrue(jar.exists()) so tests skip gracefully when the tika-server-standard fat-jar hasn't been built (CI without prior install)
- Change startup health-check from / to /status (more reliable 200 response)
- Increase startup timeout to 90s for slower CI environments
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
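The startup wait described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's test code: the class name `StartupProbe`, the method `awaitReady`, and the poll interval are assumptions, and the health-check URI is passed in by the caller.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not from the PR): waits for a server to answer its health check.
final class StartupProbe {

    // Polls the URI until it answers 200 or the timeout elapses; returns true on success.
    static boolean awaitReady(URI uri, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            try {
                int status = client.send(request, HttpResponse.BodyHandlers.discarding())
                        .statusCode();
                if (status == 200) {
                    return true;
                }
            } catch (IOException e) {
                // Server is not accepting connections yet; retry after the poll interval.
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: default port 9998 and the /status endpoint named in the commit message.
        URI uri = URI.create("http://localhost:9998/status");
        boolean up = awaitReady(uri, Duration.ofSeconds(90), Duration.ofMillis(500));
        System.out.println(up ? "server is up" : "server did not start in time");
    }
}
```

Taking the URI as a parameter keeps the sketch independent of which endpoint (`/` or `/status`) ends up being used for the check.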
Pull request overview
This PR adds HTTP/2 (h2c cleartext) support to tika-server by adding the org.eclipse.jetty.http2:http2-server jar as a dependency. CXF's Jetty transport automatically detects this jar on the classpath and enables h2c negotiation alongside HTTP/1.1 on the existing port. No application code changes are needed — just the dependency addition.
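From the client side, the negotiated protocol can be checked with Java's built-in `HttpClient`. The following is a minimal sketch, assuming a tika-server instance on localhost at the default port 9998; the class and method names (`H2cProbe`, `negotiatedVersion`) are illustrative and not taken from the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not from the PR): reports which HTTP version a server negotiated.
final class H2cProbe {

    // Builds a client that prefers HTTP/2. Over cleartext it attempts an h2c upgrade
    // and falls back to HTTP/1.1 if the server does not support it.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    // Sends a body-less GET and returns the protocol version of the response.
    static HttpClient.Version negotiatedVersion(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.version();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: server on localhost:9998 unless a URI is given as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://localhost:9998/");
        System.out.println("Negotiated: " + negotiatedVersion(newClient(), uri));
    }
}
```

Against a server without h2c support, the same call would be expected to report `HTTP_1_1`, which is what makes the response version a useful assertion target.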
Changes:
- Added `http2-server` to the parent BOM dependency management and as a dependency in `tika-server-core`
- Added a unit test (`testH2c`) in `TikaServerIntegrationTest` verifying HTTP/2 negotiation
- Added a new `tika-e2e-tests/tika-server` module with end-to-end tests that start the actual fat-jar and validate HTTP/2 (h2c) on both status and parse endpoints
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tika-parent/pom.xml | Adds http2-server artifact to the dependency management block using ${jetty.http2.version} |
| tika-server/tika-server-core/pom.xml | Adds http2-server as a compile dependency (version inherited from parent BOM) |
| TikaServerIntegrationTest.java | Adds testH2c() unit test using Java's HttpClient to verify HTTP/2 negotiation |
| tika-e2e-tests/pom.xml | Registers the new tika-server e2e test module |
| tika-e2e-tests/tika-server/pom.xml | New e2e module POM with surefire skip-by-default and -Pe2e profile activation |
| TikaServerHttp2Test.java | New e2e test class that starts the fat-jar process and validates h2c on status and parse endpoints |
tika-e2e-tests/tika-server/src/test/java/org/apache/tika/server/e2e/TikaServerHttp2Test.java
- Use tika-server-standard assembly zip (unpacked via dependency plugin) instead of thin jar, so the required lib/ dependencies are available
- Health-check endpoint changed from /status to / (root always returns 200; /status requires explicit endpoint config to be enabled)
- Pre-negotiate h2c before PUT /tika parse test: h2c Upgrade requires a no-body request first; GET / establishes the HTTP/2 connection so the subsequent PUT reuses it correctly
- Drop --noFork flag (TikaServerCli does not recognize it; server runs its own fork management independently)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
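The pre-negotiation step in this commit (a body-less GET before the PUT) might look like the sketch below. It follows the commit message's description rather than the actual test code; the class `H2cWarmUp`, its method names, the `Accept` header, and the localhost address are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the warm-up-then-PUT pattern; names are illustrative.
final class H2cWarmUp {

    // Body-less GET against the root, used only to establish the HTTP/2 connection.
    static HttpRequest warmUpRequest(URI base) {
        return HttpRequest.newBuilder(base.resolve("/")).GET().build();
    }

    // PUT carrying the document bytes to /tika, the parse endpoint named in the commit.
    static HttpRequest parseRequest(URI base, byte[] document) {
        return HttpRequest.newBuilder(base.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the warm-up first, then the PUT on the same client so the
    // already-established connection can be reused.
    static HttpResponse<String> parse(HttpClient client, URI base, byte[] document)
            throws Exception {
        client.send(warmUpRequest(base), HttpResponse.BodyHandlers.discarding());
        return client.send(parseRequest(base, document), HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        URI base = URI.create("http://localhost:9998"); // assumed default address
        HttpResponse<String> response =
                parse(client, base, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(response.version() + " " + response.statusCode());
    }
}
```

Both requests go through one `HttpClient` instance, since connection reuse only happens within a single client.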
- Remove unused moduleDir variable; initialize repoRoot directly
- stopServer() now uses waitFor(5s) + destroyForcibly() + waitFor(30s) to avoid indefinite blocking if SIGTERM doesn't terminate the process
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
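A bounded shutdown along the lines of this commit could be sketched as below. This is not the PR's `stopServer()`; the class `BoundedStop` and its parameters are hypothetical, and the initial `destroy()` call is an assumption about how the SIGTERM mentioned in the commit is sent.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded shutdown; not the PR's actual stopServer().
final class BoundedStop {

    // Requests normal termination, waits briefly, then escalates to a forced kill.
    // Returns true if the process has exited by the time this method returns.
    static boolean stop(Process process, long graceSeconds, long forceSeconds)
            throws InterruptedException {
        process.destroy(); // polite request (SIGTERM on POSIX systems)
        if (process.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        process.destroyForcibly(); // escalate if the grace period expired
        return process.waitFor(forceSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in child process: the current JVM's launcher printing its version.
        String java = System.getProperty("java.home") + "/bin/java";
        Process child = new ProcessBuilder(java, "-version").start();
        // 5 s grace and 30 s forced wait, matching the values in the commit message.
        System.out.println("exited: " + stop(child, 5, 30));
    }
}
```

Because both waits carry a timeout, the caller is never blocked indefinitely, even if the child ignores the termination request.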
Addressed both Copilot review comments in commit 30b9ff3:
@tballison mind me merging this?
+1 I don't like adding this without more motivation, but I couldn't find any way that it causes security problems. ¯_(ツ)_/¯
@tballison from community member who created jira: "The main motivation is that Google Cloud Run limits request sizes to 32 mb on HTTP1.1, but has no hard cap with HTTP2. (Containers inside Google Cloud Run run without HTTPS.)"
LOL, y, I guess?
Summary
Adds HTTP/2 (h2c cleartext) support to tika-server by including the `org.eclipse.jetty.http2:http2-server` jar on the classpath. When this jar is present, CXF's Jetty transport automatically negotiates HTTP/2 alongside HTTP/1.1 on the existing port (default 9998). Existing HTTP/1.1 clients are completely unaffected.
This implements TIKA-4679. The core dependency change was originally contributed by Lawrence Moorehead (@elemdisc), see elemdisc/tika PR#1, and is cherry-picked here with full author credit.
Changes
tika-parent/pom.xml
- Adds `http2-server` to the dependency management block alongside the existing `http2-hpack`, `http2-client`, `http2-common` entries (all at `${jetty.http2.version}`)
tika-server/tika-server-core/pom.xml (Lawrence Moorehead's commit)
- Adds the `org.eclipse.jetty.http2:http2-server` runtime dependency (version from parent BOM)
tika-server/tika-server-core/src/test/.../TikaServerIntegrationTest.java (Lawrence Moorehead's commit)
- Adds a `testH2c()` unit test that sends a request via `HttpClient.Version.HTTP_2` and asserts the response was served over HTTP/2
tika-e2e-tests/tika-server/ (new module)
- Skipped by default; enabled with `-Pe2e`
tika-e2e-tests/pom.xml
- Registers the new module
How it works
Adding `http2-server` to the classpath is sufficient for h2c (HTTP/2 cleartext) support. CXF's `JettyHTTPServerEngineFactory` detects the jar at startup and wires in `HTTP2CServerConnectionFactory`. No startup code changes are required.
For h2 over TLS (recommended for production), configure `TlsConfig` in `tika-server.json`. Java 17's built-in ALPN handles protocol negotiation automatically, so no separate ALPN agent is needed.
Port management
- `EXPOSE 9998` and health-check are unchanged
Shutdown note
HTTP/2 multiplexes multiple requests over a single TCP connection. The current `shutdownNow()` path does not send a GOAWAY frame before closing. Under moderate load this is acceptable for h2c, but a future improvement could add a drain timeout for graceful HTTP/2 shutdown.
Backward compatibility
Purely additive classpath change:
Testing Instructions
Manually with curl (after starting the server):
Review Checklist
- `http2-server` version comes from `${jetty.http2.version}` in parent BOM (not hardcoded)
- `TikaServerIntegrationTest#testH2c` passes
- e2e tests pass with `-Pe2e`
Potential Concerns
- A `jetty-alpn-java-server` dependency may be needed depending on the Jetty version and JVM. This can be addressed in a follow-up.
- The `http2-server` jar adds ~500 KB to `tika-server-standard`. This also increases the `apache/tika` Docker image slightly.